16 research outputs found

    An Open Framework for Extensible Multi-Stage Bioinformatics Software

    In research labs, there is often a need to customise software at every step in a given bioinformatics workflow, but traditionally it has been difficult to obtain both a high degree of customisability and good performance. Performance-sensitive tools are often highly monolithic, which can make research difficult. We present a novel set of software development principles and a bioinformatics framework, Friedrich, which is currently in early development. Friedrich applications support both early stage experimentation and late stage batch processing, since they simultaneously allow for good performance and a high degree of flexibility and customisability. These benefits are obtained in large part by basing Friedrich on the multiparadigm programming language Scala. We present a case study in the form of a basic genome assembler and its extension with new functionality. Our architecture has the potential to greatly increase the overall productivity of software developers and researchers in bioinformatics. Comment: 12 pages, 1 figure, to appear in proceedings of PRIB 201

    Evolutionary conserved microRNAs are ubiquitously expressed compared to tick-specific miRNAs in the cattle tick Rhipicephalus (Boophilus) microplus

    Background: MicroRNAs (miRNAs) are small non-coding RNAs that act as regulators of gene expression in eukaryotes, modulating a large diversity of biological processes. The discovery of miRNAs has provided new opportunities to understand the biology of a number of species. The cattle tick, Rhipicephalus (Boophilus) microplus, causes significant economic losses in cattle production worldwide, and this drives us to further understand its biology so that effective control measures can be developed. To provide new insights into the biology of cattle ticks and to expand the repertoire of tick miRNAs, we utilized Illumina technology to sequence the small RNA transcriptomes derived from various life stages and selected organs of R. microplus. Results: To discover and profile cattle tick miRNAs we employed two complementary approaches, one aiming to find evolutionary conserved miRNAs and another focused on the discovery of novel cattle-tick specific miRNAs. We found 51 evolutionary conserved R. microplus miRNA loci, with 36 of these previously found in the tick Ixodes scapularis. The majority of the R. microplus miRNAs are perfectly conserved throughout evolution, with 11, 5 and 15 of these conserved since the Nephrozoan (640 MYA), Protostomian (620 MYA) and Arthropoda (540 MYA) ancestor, respectively. We then employed a de novo computational screening for novel tick miRNAs using the draft genome of I. scapularis and genomic contigs of R. microplus as templates. This identified 36 novel R. microplus miRNA loci, of which 12 were conserved in I. scapularis. Overall, we found 87 R. microplus miRNA loci; of these, 15 showed the expression of both miRNA and miRNA* sequences. R. microplus miRNAs showed a variety of expression profiles, with the evolutionary-conserved miRNAs mainly expressed in all life stages at various levels, while the expression of novel tick-specific miRNAs was mostly limited to particular life stages and/or tick organs.
Conclusions: Anciently acquired miRNAs in the R. microplus lineage not only tend to accumulate fewer nucleotide substitutions than recently acquired miRNAs, but also show ubiquitous expression profiles throughout tick life stages and organs, in contrast to the restricted expression profiles of novel tick-specific miRNAs.

    Genetic analysis of global faba bean diversity, agronomic traits and selection signatures

    Faba bean (Vicia faba L.) is a high-protein grain legume crop with great potential for sustainable protein production. However, little is known about the genetics underlying trait diversity. In this study, we used 21,345 high-quality SNP markers to genetically characterize 2678 faba bean genotypes. We performed genome-wide association studies of key agronomic traits using a seven-parent-MAGIC population and detected 238 significant marker-trait associations linked to 12 traits of agronomic importance. Sixty-five of these were stable across multiple environments. Using a non-redundant diversity panel of 685 accessions from 52 countries, we identified three subpopulations differentiated by geographical origin and 33 genomic regions subjected to strong diversifying selection between subpopulations. We found that SNP markers associated with the differentiation of northern and southern accessions explained a significant proportion of agronomic trait variance in the seven-parent-MAGIC population, suggesting that some of these traits were targets of selection during breeding. Our findings point to genomic regions associated with important agronomic traits and selection, facilitating faba bean genomics-based breeding

    De novo assembly of Euphorbia fischeriana root transcriptome identifies prostratin pathway related genes

    Background: Euphorbia fischeriana is an important medicinal plant found in Northeast China. The plant roots contain many medicinal compounds including 12-deoxyphorbol-13-acetate, commonly known as prostratin, a phorbol ester from the tigliane diterpene series. Prostratin is a protein kinase C activator and is effective in the treatment of Human Immunodeficiency Virus (HIV) by acting as a latent HIV activator. Latent HIV is currently the biggest limitation for viral eradication. The aim of this study was to sequence, assemble and annotate the E. fischeriana transcriptome to better understand the potential biochemical pathways leading to the synthesis of prostratin and other related diterpene compounds. Results: In this study we used a high-throughput RNA-seq approach to sequence the root transcriptome of E. fischeriana. We assembled 18,180 transcripts; the majority of these corresponded to protein-coding genes and only 17 transcripts corresponded to known RNA genes. Interestingly, we identified 5,956 protein-coding transcripts with high similarity (≥75%) to Ricinus communis, a close relative of E. fischeriana. We also evaluated the conservation of E. fischeriana genes against EST datasets from the Euphorbiaceae family, which included R. communis, Hevea brasiliensis and Euphorbia esula. We identified a core set of 1,145 gene clusters conserved in all four species and 1,487 E. fischeriana paralogous genes. Furthermore, we screened E. fischeriana transcripts against an in-house reference database for genes implicated in the biosynthesis of upstream precursors to prostratin. This identified 24 and 9 candidate transcripts involved in the terpenoid and diterpenoid biosynthesis pathways, respectively.
The majority of the candidate genes in these pathways presented relatively low expression levels except for 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (HDS) and isopentenyl diphosphate/dimethylallyl diphosphate synthase (IDS), which are required for multiple downstream pathways including synthesis of casbene, a proposed precursor to prostratin. Conclusion: The resources generated in this study provide new insights into the upstream pathways to the synthesis of prostratin and will likely facilitate functional studies aiming to produce larger quantities of this compound for HIV research and/or treatment of patients.

    The giant diploid faba genome unlocks variation in a global protein crop

    Publisher Copyright: © 2023, The Author(s). Increasing the proportion of locally produced plant protein in currently meat-rich diets could substantially reduce greenhouse gas emissions and loss of biodiversity [1]. However, plant protein production is hampered by the lack of a cool-season legume equivalent to soybean in agronomic value [2]. Faba bean (Vicia faba L.) has a high yield potential and is well suited for cultivation in temperate regions, but genomic resources are scarce. Here, we report a high-quality chromosome-scale assembly of the faba bean genome and show that it has expanded to a massive 13 Gb in size through an imbalance between the rates of amplification and elimination of retrotransposons and satellite repeats. Genes and recombination events are evenly dispersed across chromosomes and the gene space is remarkably compact considering the genome size, although with substantial copy number variation driven by tandem duplication. Demonstrating practical application of the genome sequence, we develop a targeted genotyping assay and use high-resolution genome-wide association analysis to dissect the genetic basis of seed size and hilum colour. The resources presented constitute a genomics-based breeding platform for faba bean, enabling breeders and geneticists to accelerate the improvement of sustainable protein production across the Mediterranean, subtropical and northern temperate agroecological zones. Peer reviewed

    Shifting the limits in wheat research and breeding using a fully annotated reference genome

    Introduction: Wheat (Triticum aestivum L.) is the most widely cultivated crop on Earth, contributing about a fifth of the total calories consumed by humans. Consequently, wheat yields and production affect the global economy, and failed harvests can lead to social unrest. Breeders continuously strive to develop improved varieties by fine-tuning genetically complex yield and end-use quality parameters while maintaining stable yields and adapting the crop to regionally specific biotic and abiotic stresses. Rationale: Breeding efforts are limited by insufficient knowledge and understanding of wheat biology and the molecular basis of central agronomic traits. To meet the demands of human population growth, there is an urgent need for wheat research and breeding to accelerate genetic gain as well as to increase and protect wheat yield and quality traits. In other plant and animal species, access to a fully annotated and ordered genome sequence, including regulatory sequences and genome-diversity information, has promoted the development of systematic and more time-efficient approaches for the selection and understanding of important traits. Wheat has lagged behind, primarily owing to the challenges of assembling a genome that is more than five times as large as the human genome, polyploid, and complex, containing more than 85% repetitive DNA. To provide a foundation for improvement through molecular breeding, in 2005, the International Wheat Genome Sequencing Consortium set out to deliver a high-quality annotated reference genome sequence of bread wheat. Results: An annotated reference sequence representing the hexaploid bread wheat genome in the form of 21 chromosome-like sequence assemblies has now been delivered, giving access to 107,891 high-confidence genes, including their genomic context of regulatory sequences. 
This assembly enabled the discovery of tissue- and developmental stage–related gene coexpression networks using a transcriptome atlas representing all stages of wheat development. The dynamics of change in complex gene families involved in environmental adaptation and end-use quality were revealed at subgenome resolution and contextualized to known agronomic single-gene or quantitative trait loci. Aspects of the future value of the annotated assembly for molecular breeding and research were exemplarily illustrated by resolving the genetic basis of a quantitative trait locus conferring resistance to abiotic stress and insect damage as well as by serving as the basis for genome editing of the flowering-time trait. Conclusion: This annotated reference sequence of wheat is a resource that can now drive disruptive innovation in wheat improvement, as this community resource establishes the foundation for accelerating wheat research and application through improved understanding of wheat biology and genomics-assisted breeding. Importantly, the bioinformatics capacity developed for model-organism genomes will facilitate a better understanding of the wheat genome as a result of the high-quality chromosome-based genome assembly. By necessity, breeders work with the genome at the whole chromosome level, as each new cross involves the modification of genome-wide gene networks that control the expression of complex traits such as yield. With the annotated and ordered reference genome sequence in place, researchers and breeders can now easily access sequence-level information to precisely define the necessary changes in the genomes for breeding programs. This will be realized through the implementation of new DNA marker platforms and targeted breeding technologies, including genome editing

    High-resolution mapping of rachis nodes per rachis, a critical determinant of grain yield components in wheat

    Exploring large genomic data sets based on the latest reference genome assembly identifies the rice ortholog APO1 as a key candidate gene for number of rachis nodes per spike in wheat. Increasing grain yield in wheat is a key breeding objective worldwide. Several component traits contribute to grain yield, with spike attributes being among the most important. In this study, we performed a genome-wide association analysis for 12 grain yield and component traits measured in field trials with contrasting agrochemical input levels in a panel of 220 hexaploid winter wheats. A highly significant, environmentally consistent QTL was detected for number of rachis nodes per rachis (NRN) on chromosome 7AL. The five most significant SNPs formed a strong linkage disequilibrium (LD) block and tagged a 2.23 Mb region. Using pairwise LD for exome SNPs located across this interval in a large worldwide hexaploid wheat collection, we reduced the genomic region for NRN to a 258 Kb interval containing four of the original SNPs and six high-confidence genes. The ortholog of one (TraesCS7A01G481600) of these genes in rice was ABERRANT PANICLE ORGANIZATION1 (APO1), which is known to have significant effects on panicle attributes. The APO1 ortholog was the best candidate for NRN and was associated with a 115 bp promoter deletion and two amino acid (C47F and D384N) changes. Using a large worldwide collection of tetraploid and hexaploid wheat, we found 12 haplotypes for the NRN QTL and evidence for positive enrichment of two haplotypes in modern germplasm. Comparison of five QTL haplotypes in Australian yield trials revealed their relative, context-dependent contribution to grain yield. Our study provides diagnostic SNPs and value propositions to support deployment of the NRN trait in wheat breeding.

    Optical maps refine the bread wheat Triticum aestivum cv. Chinese Spring genome assembly

    Until recently, achieving a reference-quality genome sequence for bread wheat was long thought beyond the limits of genome sequencing and assembly technology, primarily due to the large genome size and > 80% repetitive sequence content. The release of the chromosome scale 14.5-Gb IWGSC RefSeq v1.0 genome sequence of bread wheat cv. Chinese Spring (CS) was, therefore, a milestone. Here, we used a direct label and stain (DLS) optical map of the CS genome together with a prior nick, label, repair and stain (NLRS) optical map, and sequence contigs assembled with Pacific Biosciences long reads, to refine the v1.0 assembly. Inconsistencies between the sequence and maps were reconciled and gaps were closed. Gap filling and anchoring of 279 unplaced scaffolds increased the total length of pseudomolecules by 168 Mb (excluding Ns). Positions and orientations were corrected for 233 and 354 scaffolds, respectively, representing 10% of the genome sequence. The accuracy of the remaining 90% of the assembly was validated. As a result of the increased contiguity, the numbers of transposable elements (TEs) and intact TEs have increased in IWGSC RefSeq v2.1 compared with v1.0. In total, 98% of the gene models identified in v1.0 were mapped onto this new assembly through development of a dedicated approach implemented in the MAGAAT pipeline. The numbers of high-confidence genes on pseudomolecules have increased from 105 319 to 105 534. The reconciled assembly enhances the utility of the sequence for genetic mapping, comparative genomics, gene annotation and isolation, and more general studies on the biology of wheat.

    Genome mapping of seed-borne allergens and immunoresponsive proteins in wheat

    Wheat is an important staple grain for humankind globally because of its end-use quality and nutritional properties and its adaptability to diverse climates. For a small proportion of the population, specific wheat proteins can trigger adverse immune responses and clinical manifestations such as celiac disease, wheat allergy, baker’s asthma, and wheat-dependent exercise-induced anaphylaxis (WDEIA). Establishing the content and distribution of the immunostimulatory regions in wheat has been hampered by the complexity of the wheat genome and the lack of complete genome sequence information. We provide novel insights into the wheat grain proteins based on a comprehensive analysis and annotation of the wheat prolamin Pfam clan grain proteins and other non-prolamin allergens implicated in these disorders using the new International Wheat Genome Sequencing Consortium bread wheat reference genome sequence, RefSeq v1.0. Celiac disease and WDEIA genes are primarily expressed in the starchy endosperm and show wide variation in protein- and transcript-level expression in response to temperature stress. Nonspecific lipid transfer proteins and α-amylase trypsin inhibitor gene families, implicated in baker’s asthma, are primarily expressed in the aleurone layer and transfer cells of grains and are more sensitive to cold temperature. The study establishes a new reference map for immunostimulatory wheat proteins and provides a fresh basis for selecting wheat lines and developing diagnostics for products with more favorable consumer attributes